semantic index
Scientific Paper Retrieval with LLM-Guided Semantic-Based Ranking
Zhang, Yunyi, Yang, Ruozhen, Jiao, Siqi, Kang, SeongKu, Han, Jiawei
Scientific paper retrieval is essential for supporting literature discovery and research. While dense retrieval methods demonstrate effectiveness in general-purpose tasks, they often fail to capture fine-grained scientific concepts that are essential for accurate understanding of scientific queries. Recent studies also use large language models (LLMs) for query understanding; however, these methods often lack grounding in corpus-specific knowledge and may generate unreliable or unfaithful content. To overcome these limitations, we propose SemRank, an effective and efficient paper retrieval framework that combines LLM-guided query understanding with a concept-based semantic index. Each paper is indexed using multi-granular scientific concepts, including general research topics and detailed key phrases. At query time, an LLM identifies core concepts derived from the corpus to explicitly capture the query's information need. These identified concepts enable precise semantic matching, significantly enhancing retrieval accuracy. Experiments show that SemRank consistently improves the performance of various base retrievers, surpasses strong existing LLM-based baselines, and remains highly efficient.
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
A Semantic Search Pipeline for Causality-driven Adhoc Information Retrieval
Dalal, Dhairya, Gupta, Sharmi Dev, Binaei, Bentolhoda
We present a unsupervised semantic search pipeline for the Causality-driven Adhoc Information Retrieval (CAIR-2021) shared task. The CAIR shared task expands traditional information retrieval to support the retrieval of documents containing the likely causes of a query event. A successful system must be able to distinguish between topical documents and documents containing causal descriptions of events that are causally related to the query event. Our approach involves aggregating results from multiple query strategies over a semantic and lexical index. The proposed approach leads the CAIR-2021 leaderboard and outperformed both traditional IR and pure semantic embedding-based approaches.
- Asia > India (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Ireland (0.04)
- (2 more...)
Unleash LLMs Potential for Recommendation by Coordinating Twin-Tower Dynamic Semantic Token Generator
Yin, Jun, Zeng, Zhengxin, Li, Mingzheng, Yan, Hao, Li, Chaozhuo, Han, Weihao, Zhang, Jianjin, Liu, Ruochen, Sun, Allen, Deng, Denvy, Sun, Feng, Zhang, Qi, Pan, Shirui, Wang, Senzhang
Owing to the unprecedented capability in semantic understanding and logical reasoning, the pre-trained large language models (LLMs) have shown fantastic potential in developing the next-generation recommender systems (RSs). However, the static index paradigm adopted by current methods greatly restricts the utilization of LLMs capacity for recommendation, leading to not only the insufficient alignment between semantic and collaborative knowledge, but also the neglect of high-order user-item interaction patterns. In this paper, we propose Twin-Tower Dynamic Semantic Recommender (TTDS), the first generative RS which adopts dynamic semantic index paradigm, targeting at resolving the above problems simultaneously. To be more specific, we for the first time contrive a dynamic knowledge fusion framework which integrates a twin-tower semantic token generator into the LLM-based recommender, hierarchically allocating meaningful semantic index for items and users, and accordingly predicting the semantic index of target item. Furthermore, a dual-modality variational auto-encoder is proposed to facilitate multi-grained alignment between semantic and collaborative knowledge. Eventually, a series of novel tuning tasks specially customized for capturing high-order user-item interaction patterns are proposed to take advantages of user historical behavior. Extensive experiments across three public datasets demonstrate the superiority of the proposed methodology in developing LLM-based generative RSs. The proposed TTDS recommender achieves an average improvement of 19.41% in Hit-Rate and 20.84% in NDCG metric, compared with the leading baseline methods.
- Asia > China > Beijing > Beijing (0.05)
- Asia > China > Hunan Province (0.04)
- Oceania > Australia > Queensland > Brisbane (0.04)
- North America > United States > New York > New York County > New York City (0.04)
Distantly Supervised Morpho-Syntactic Model for Relation Extraction
Gutehrlé, Nicolas, Atanassova, Iana
The task of Information Extraction (IE) involves automatically converting unstructured textual content into structured data. Most research in this field concentrates on extracting all facts or a specific set of relationships from documents. In this paper, we present a method for the extraction and categorisation of an unrestricted set of relationships from text. Our method relies on morpho-syntactic extraction patterns obtained by a distant supervision method, and creates Syntactic and Semantic Indices to extract and classify candidate graphs. We evaluate our approach on six datasets built on Wikidata and Wikipedia. The evaluation shows that our approach can achieve Precision scores of up to 0.85, but with lower Recall and F1 scores. Our approach allows to quickly create rule-based systems for Information Extraction and to build annotated datasets to train machine-learning and deep-learning based classifiers.
- North America > United States > Washington > King County > Seattle (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
- (24 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Microsoft: Copilot AI helps you skip meetings, zoom through email
The AI-powered Microsoft 365 Copilot could allow you to skip or even double-book meetings without missing out on what was discussed, "hopscotch" through priority email, and more. Microsoft 365 Copilot, announced in March, unfortunately remains in preview. Microsoft is confident enough of what it can do, though, that it said today that it's charging 600 worldwide customers to try it out as part of a Microsoft 365 Copilot Early Access Pass. In March, corporate vice president Jared Spataro said that Copilot would come to basically all Microsoft 365 apps: Word, PowerPoint, Excel, Teams and more. The company released a number of video demonstrations of how Microsoft 365 Copilot will work in its various apps.
Detecting Policy Preferences and Dynamics in the UN General Debate with Neural Word Embeddings
Gurciullo, Stefano, Mikhaylov, Slava
Foreign policy analysis has been struggling to find ways to measure policy preferences and paradigm shifts in international political systems. This paper presents a novel, potential solution to this challenge, through the application of a neural word embedding (Word2vec) model on a dataset featuring speeches by heads of state or government in the United Nations General Debate. The paper provides three key contributions based on the output of the Word2vec model. First, it presents a set of policy attention indices, synthesizing the semantic proximity of political speeches to specific policy themes. Second, it introduces country-specific semantic centrality indices, based on topological analyses of countries' semantic positions with respect to each other. Third, it tests the hypothesis that there exists a statistical relation between the semantic content of political speeches and UN voting behavior, falsifying it and suggesting that political speeches contain information of different nature then the one behind voting outcomes. The paper concludes with a discussion of the practical use of its results and consequences for foreign policy analysis, public accountability, and transparency.
- North America > United States (0.47)
- Asia > Russia (0.29)
- Europe > Russia (0.15)
- (27 more...)
- Government > Foreign Policy (1.00)
- Government > Military (0.69)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.90)